Overview

Dataset statistics

Number of variables: 10
Number of observations: 500
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 39.2 KiB
Average record size in memory: 80.3 B
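
The average record size appears to follow from the total size divided by the number of observations. A minimal sketch of that arithmetic, assuming 1 KiB = 1024 bytes and using only the rounded figures reported above (the variable names are illustrative, not taken from the report):

```python
# Figures copied from the dataset statistics above.
total_size_kib = 39.2
n_observations = 500

# Assumption: 1 KiB = 1024 bytes.
average_record_bytes = total_size_kib * 1024 / n_observations

print(round(average_record_bytes, 1))  # 80.3
```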

Variable types

NUM: 7
CAT: 3

Warnings

car name has a high cardinality: 77 distinct values
id has unique values
mpg has unique values
acceleration has unique values

Reproduction

Analysis started: 2020-11-01 15:29:24.417281
Analysis finished: 2020-11-01 15:29:47.396618
Duration: 22.98 seconds
Software version: pandas-profiling v2.9.0
Download configuration: config.yaml

Variables

id
Real number (ℝ≥0)

UNIQUE

Distinct: 500
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 500.176
Minimum: 0
Maximum: 997
Zeros: 1
Zeros (%): 0.2%
Memory size: 3.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 56.95
Q1: 242.25
Median: 513
Q3: 750.25
95-th percentile: 943.2
Maximum: 997
Range: 997
Interquartile range (IQR): 508

Descriptive statistics

Standard deviation: 288.6571789
Coefficient of variation (CV): 0.5771112147
Kurtosis: -1.223066581
Mean: 500.176
Median Absolute Deviation (MAD): 250.5
Skewness: -0.03182781094
Sum: 250088
Variance: 83322.96696
Monotonicity: Strictly increasing
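
Several of the figures above are tied together by simple identities: the range is the maximum minus the minimum, the IQR is Q3 minus Q1, the variance is the squared standard deviation, the CV is the standard deviation divided by the mean, and the mean is the sum divided by the number of observations. A short check of those identities against the values reported for id; the tolerance is an assumption chosen to absorb rounding in the report:

```python
import math

# Values copied from the statistics reported for the id variable.
minimum, maximum = 0, 997
q1, q3 = 242.25, 750.25
mean, std = 500.176, 288.6571789
variance, cv = 83322.96696, 0.5771112147
total, n = 250088, 500

value_range = maximum - minimum   # reported Range: 997
iqr = q3 - q1                     # reported IQR: 508

# Assumed tolerance: the report rounds to about ten significant digits.
TOL = 1e-6
checks = {
    "range": value_range == 997,
    "iqr": iqr == 508,
    "variance = std^2": math.isclose(std ** 2, variance, rel_tol=TOL),
    "cv = std / mean": math.isclose(std / mean, cv, rel_tol=TOL),
    "mean = sum / n": math.isclose(total / n, mean, rel_tol=TOL),
}

for name, ok in checks.items():
    print(name, "consistent" if ok else "inconsistent")
```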
Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
997                 1      0.2%
343                 1      0.2%
321                 1      0.2%
322                 1      0.2%
323                 1      0.2%
324                 1      0.2%
326                 1      0.2%
327                 1      0.2%
328                 1      0.2%
329                 1      0.2%
Other values (490)  490    98.0%

Value  Count  Frequency (%)
0      1      0.2%
3      1      0.2%
4      1      0.2%
7      1      0.2%
9      1      0.2%

Value  Count  Frequency (%)
997    1      0.2%
995    1      0.2%
994    1      0.2%
983    1      0.2%
981    1      0.2%

mpg
Real number (ℝ≥0)

UNIQUE

Distinct: 500
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 27.01093994
Minimum: 15.78761298
Maximum: 44.7638971
Zeros: 0
Zeros (%): 0.0%
Memory size: 3.9 KiB

Quantile statistics

Minimum: 15.78761298
5-th percentile: 17.20086215
Q1: 22.39664058
Median: 26.2289843
Q3: 35.08833319
95-th percentile: 36.40464199
Maximum: 44.7638971
Range: 28.97628412
Interquartile range (IQR): 12.69169261

Descriptive statistics

Standard deviation: 7.356248557
Coefficient of variation (CV): 0.2723433014
Kurtosis: -0.8799047176
Mean: 27.01093994
Median Absolute Deviation (MAD): 7.479963315
Skewness: 0.3626768372
Sum: 13505.46997
Variance: 54.11439284
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
23.2943741          1      0.2%
35.94487152         1      0.2%
17.61331358         1      0.2%
28.69833784         1      0.2%
34.66772443         1      0.2%
23.16728215         1      0.2%
26.71916695         1      0.2%
35.6918206          1      0.2%
22.8104364          1      0.2%
28.38035121         1      0.2%
Other values (490)  490    98.0%

Value        Count  Frequency (%)
15.78761298  1      0.2%
15.99133885  1      0.2%
16.27513541  1      0.2%
16.30387949  1      0.2%
16.40398691  1      0.2%

Value        Count  Frequency (%)
44.7638971   1      0.2%
44.7184269   1      0.2%
44.68008394  1      0.2%
44.53396354  1      0.2%
44.45964949  1      0.2%

cylinders
Categorical

Distinct: 3
Distinct (%): 0.6%
Missing: 0
Missing (%): 0.0%
Memory size: 3.9 KiB
Value  Count  Frequency (%)
4      305    61.0%
8      103    20.6%
6      92     18.4%
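
The Frequency (%) and Distinct (%) figures for cylinders appear to be plain ratios against the 500 observations. A small sketch that recomputes them from the reported counts; the dictionary below simply restates the table, and rounding to one decimal is an assumption that matches the report's display:

```python
# Counts copied from the cylinders table above.
counts = {"4": 305, "8": 103, "6": 92}

n_observations = sum(counts.values())  # 500, matching the overview

# Share of observations per category, as a percentage with one decimal.
frequency_pct = {
    value: round(100 * count / n_observations, 1)
    for value, count in counts.items()
}

# Share of distinct values relative to the number of observations.
distinct_pct = round(100 * len(counts) / n_observations, 1)

print(frequency_pct)  # {'4': 61.0, '8': 20.6, '6': 18.4}
print(distinct_pct)   # 0.6
```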
 
Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%
Histogram of lengths of the category

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

displacement
Real number (ℝ≥0)

Distinct: 43
Distinct (%): 8.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 194.762
Minimum: 79
Maximum: 429
Zeros: 0
Zeros (%): 0.0%
Memory size: 3.9 KiB

Quantile statistics

Minimum: 79
5-th percentile: 89
Q1: 104
Median: 140
Q3: 302
95-th percentile: 400
Maximum: 429
Range: 350
Interquartile range (IQR): 198

Descriptive statistics

Standard deviation: 106.2774253
Coefficient of variation (CV): 0.5456784452
Kurtosis: -0.7672025598
Mean: 194.762
Median Absolute Deviation (MAD): 43
Skewness: 0.807167207
Sum: 97381
Variance: 11294.89114
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=43)
Value              Count  Frequency (%)
140                65     13.0%
97                 41     8.2%
302                38     7.6%
400                30     6.0%
318                27     5.4%
90                 22     4.4%
350                20     4.0%
104                20     4.0%
200                19     3.8%
151                16     3.2%
Other values (33)  202    40.4%

Value  Count  Frequency (%)
79     1      0.2%
80     4      0.8%
85     10     2.0%
88     2      0.4%
89     11     2.2%

Value  Count  Frequency (%)
429    15     3.0%
400    30     6.0%
360    2      0.4%
351    2      0.4%
350    20     4.0%

horsepower
Real number (ℝ≥0)

Distinct: 40
Distinct (%): 8.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 106.8452138
Minimum: 54
Maximum: 220
Zeros: 0
Zeros (%): 0.0%
Memory size: 3.9 KiB

Quantile statistics

Minimum: 54
5-th percentile: 61
Q1: 85
Median: 100
Q3: 130
95-th percentile: 150
Maximum: 220
Range: 166
Interquartile range (IQR): 45

Descriptive statistics

Standard deviation: 35.27743567
Coefficient of variation (CV): 0.330173289
Kurtosis: 1.216583084
Mean: 106.8452138
Median Absolute Deviation (MAD): 23.5
Skewness: 1.042661291
Sum: 53422.60692
Variance: 1244.497467
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=40)
Value              Count  Frequency (%)
85                 47     9.4%
150                44     8.8%
97                 41     8.2%
110                40     8.0%
67                 39     7.8%
100                29     5.8%
90                 27     5.4%
148                21     4.2%
60                 20     4.0%
71                 17     3.4%
Other values (30)  175    35.0%

Value  Count  Frequency (%)
54     3      0.6%
58     1      0.2%
60     20     4.0%
61     2      0.4%
64     1      0.2%

Value  Count  Frequency (%)
220    14     2.8%
193    6      1.2%
165    2      0.4%
150    44     8.8%
148    21     4.2%

weight
Real number (ℝ≥0)

Distinct: 79
Distinct (%): 15.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2719.714
Minimum: 1755
Maximum: 4732
Zeros: 0
Zeros (%): 0.0%
Memory size: 3.9 KiB

Quantile statistics

Minimum: 1755
5-th percentile: 1875
Q1: 2178.75
Median: 2615
Q3: 3193
95-th percentile: 4275.05
Maximum: 4732
Range: 2977
Interquartile range (IQR): 1014.25

Descriptive statistics

Standard deviation: 717.0354104
Coefficient of variation (CV): 0.2636436811
Kurtosis: 0.08925493619
Mean: 2719.714
Median Absolute Deviation (MAD): 491
Skewness: 0.9191024558
Sum: 1359857
Variance: 514139.7798
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value              Count  Frequency (%)
3193               35     7.0%
2000               31     6.2%
2300               30     6.0%
3233               24     4.8%
2774               22     4.4%
1875               22     4.4%
2200               20     4.0%
2815               16     3.2%
2123               15     3.0%
2245               14     2.8%
Other values (69)  271    54.2%

Value  Count  Frequency (%)
1755   2      0.4%
1760   12     2.4%
1875   22     4.4%
1925   1      0.2%
1955   3      0.6%

Value  Count  Frequency (%)
4732   3      0.6%
4638   1      0.2%
4464   1      0.2%
4456   13     2.6%
4376   1      0.2%

acceleration
Real number (ℝ≥0)

UNIQUE

Distinct: 500
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 15.3003277
Minimum: 9.530858797
Maximum: 21.92251057
Zeros: 0
Zeros (%): 0.0%
Memory size: 3.9 KiB

Quantile statistics

Minimum: 9.530858797
5-th percentile: 13.04329295
Q1: 13.44156163
Median: 15.23192308
Q3: 17.19053114
95-th percentile: 19.34364274
Maximum: 21.92251057
Range: 12.39165177
Interquartile range (IQR): 3.74896951

Descriptive statistics

Standard deviation: 2.261096048
Coefficient of variation (CV): 0.1477808902
Kurtosis: 0.516151278
Mean: 15.3003277
Median Absolute Deviation (MAD): 1.85832857
Skewness: 0.2892277264
Sum: 7650.163849
Variance: 5.112555338
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
14.87498839         1      0.2%
15.05676058         1      0.2%
13.38957282         1      0.2%
13.74991059         1      0.2%
13.04427743         1      0.2%
15.31706211         1      0.2%
15.41902455         1      0.2%
14.99155532         1      0.2%
17.85633313         1      0.2%
9.719183764         1      0.2%
Other values (490)  490    98.0%

Value        Count  Frequency (%)
9.530858797  1      0.2%
9.559641329  1      0.2%
9.57878946   1      0.2%
9.590617368  1      0.2%
9.621400371  1      0.2%

Value        Count  Frequency (%)
21.92251057  1      0.2%
21.88568819  1      0.2%
21.75093686  1      0.2%
21.67368634  1      0.2%
21.60600638  1      0.2%

model year
Real number (ℝ≥0)

Distinct: 13
Distinct (%): 2.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 76.332
Minimum: 70
Maximum: 82
Zeros: 0
Zeros (%): 0.0%
Memory size: 3.9 KiB

Quantile statistics

Minimum: 70
5-th percentile: 70
Q1: 73
Median: 76
Q3: 80
95-th percentile: 82
Maximum: 82
Range: 12
Interquartile range (IQR): 7

Descriptive statistics

Standard deviation: 3.909007121
Coefficient of variation (CV): 0.05121059479
Kurtosis: -1.335199548
Mean: 76.332
Median Absolute Deviation (MAD): 4
Skewness: -0.1839449657
Sum: 38166
Variance: 15.28033667
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=13)
Value             Count  Frequency (%)
81                63     12.6%
71                62     12.4%
80                52     10.4%
76                48     9.6%
79                43     8.6%
70                35     7.0%
78                34     6.8%
82                33     6.6%
73                32     6.4%
75                29     5.8%
Other values (3)  69     13.8%

Value  Count  Frequency (%)
70     35     7.0%
71     62     12.4%
72     19     3.8%
73     32     6.4%
74     27     5.4%

Value  Count  Frequency (%)
82     33     6.6%
81     63     12.6%
80     52     10.4%
79     43     8.6%
78     34     6.8%

origin
Categorical

Distinct: 3
Distinct (%): 0.6%
Missing: 0
Missing (%): 0.0%
Memory size: 3.9 KiB
Value  Count  Frequency (%)
1      373    74.6%
3      83     16.6%
2      44     8.8%
Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%
Histogram of lengths of the category

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

car name
Categorical

HIGH CARDINALITY

Distinct: 77
Distinct (%): 15.4%
Missing: 0
Missing (%): 0.0%
Memory size: 3.9 KiB
Value                  Count  Frequency (%)
dodge monaco brougham  27     5.4%
datsun 200sx           24     4.8%
chevrolet nova         21     4.2%
vw rabbit              18     3.6%
pontiac astro          17     3.4%
ford pinto             17     3.4%
ford futura            17     3.4%
honda civic 1300       16     3.2%
dodge rampage          15     3.0%
dodge aspen            15     3.0%
Other values (67)      313    62.6%
Frequencies of value counts

Unique

Unique: 22
Unique (%): 4.4%
Histogram of lengths of the category

Length

Max length: 33
Median length: 14
Mean length: 14.77
Min length: 8

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, so for a perfectly linear relationship the slope of the line does not affect the magnitude of r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
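
The calculation described above can be sketched in plain Python. This is an illustrative implementation, not the code the report itself runs; the function name and the toy data are assumptions. The 1/(n-1) factors of the covariance and the standard deviations cancel, so they are omitted.

```python
import math

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of centered products; the common 1/(n-1) factor cancels in the ratio.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Toy data (hypothetical): y is an exact increasing linear function of x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0 * v + 1.0 for v in x]

r_linear = pearson_r(x, y)
# Shifting and rescaling y by a positive factor leaves r unchanged.
r_rescaled = pearson_r(x, [10.0 * v - 3.0 for v in y])
# A decreasing linear relationship gives r = -1.
r_negative = pearson_r(x, [-v for v in y])

print(r_linear, r_rescaled, r_negative)
```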

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of the standard deviations of those rank variables.
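
A minimal sketch of that recipe in plain Python: rank both variables, then apply the Pearson formula to the ranks. Assigning tied values the average of their positions is an assumption of this sketch, and the function names and toy data are illustrative only.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks

def pearson_r(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def spearman_rho(x, y):
    """Pearson correlation of the rank variables."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Toy data (hypothetical): monotonic but nonlinear relationship.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]

rho = spearman_rho(x, y)   # perfect monotonic association
r_raw = pearson_r(x, y)    # below 1 because the relationship is not linear

print(rho, r_raw)
```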

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is then the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
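
The pair-counting definition above can be sketched directly. This follows the formula exactly as stated (often called tau-a); other variants, such as tau-b, adjust the denominator for ties, and which variant a given library computes is not something this sketch assumes. The function name and toy data are illustrative.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs."""
    concordant = 0
    discordant = 0
    for i, j in combinations(range(len(x)), 2):
        # The sign of the product tells whether the pair is ordered the same way in x and y.
        product = (x[i] - x[j]) * (y[i] - y[j])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
        # product == 0 means a tie in x or y: counted in neither group.
    n = len(x)
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs

# Toy data (hypothetical): one swapped pair out of six.
x = [1, 2, 3, 4]
y = [1, 3, 2, 4]

tau = kendall_tau_a(x, y)  # 5 concordant, 1 discordant, 6 pairs
print(tau)
```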

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available for the phik package.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
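
A rough sketch of a bias-corrected Cramér's V for a contingency table, following the correction published by Bergsma (2013). It is written in plain Python for illustration and is not the report's own code. Assumptions: the chi-squared statistic is computed without a continuity correction, every row and column total is non-zero, and the table has at least two rows and two columns. The function name and toy tables are hypothetical.

```python
import math

def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a table given as a list of rows of counts."""
    n = sum(sum(row) for row in table)
    r = len(table)       # number of rows
    k = len(table[0])    # number of columns
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(r)) for j in range(k)]

    # Pearson chi-squared statistic (no continuity correction; all margins assumed > 0).
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    # Bias correction: shrink phi^2 and the effective table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    return math.sqrt(phi2_corr / min(k_corr - 1, r_corr - 1))

# Toy tables (hypothetical counts).
v_perfect = cramers_v_corrected([[50, 0], [0, 50]])        # perfect association
v_independent = cramers_v_corrected([[25, 25], [25, 25]])  # independence

print(v_perfect, v_independent)
```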

Missing values

Sample

First rows

     id   mpg        cylinders  displacement  horsepower  weight  acceleration  model year  origin  car name
0    0    23.059782  6          140           110.0       2815    17.977429     80          1       dodge aspen
1    3    17.674521  8          350           150.0       4456    13.514535     72          1       dodge rampage
2    4    17.136353  8          302           140.0       2774    13.209912     79          1       mercury cougar brougham
3    7    22.664666  6          400           85.0        2190    15.196381     71          1       pontiac j2000 se hatchback
4    9    17.872018  8          429           220.0       2245    9.621400      70          1       ford galaxie 500
5    11   23.405007  6          140           110.0       2815    18.152362     80          1       dodge aspen
6    13   17.250298  6          318           110.0       3205    19.228868     75          1       vw rabbit custom
7    16   35.469676  4          140           165.0       2145    13.519583     82          1       amc gremlin
8    19   22.839820  6          200           85.0        3193    17.215803     71          1       dodge monaco brougham
9    23   36.489563  4          104           60.0        2000    14.899884     81          1       datsun 200sx

Last rows

     id   mpg        cylinders  displacement  horsepower  weight  acceleration  model year  origin  car name
490  974  22.490094  6          200           85.0        3193    17.210477     73          1       dodge monaco brougham
491  976  36.222958  4          98            67.0        2000    14.991555     79          1       fiat 124 sport coupe
492  977  22.224490  6          400           85.0        2711    17.113384     78          1       honda civic 1300
493  978  17.344275  8          400           193.0       4732    12.956417     70          1       hi 1200d
494  980  22.739561  4          318           139.0       2525    13.294111     78          1       ford futura
495  981  22.798447  4          140           148.0       2835    13.477573     82          1       datsun 200-sx
496  983  35.173640  4          97            67.0        2234    17.542681     80          3       plymouth valiant
497  994  17.825448  8          302           220.0       2774    15.177189     76          1       triumph tr7 coupe
498  995  28.545147  4          97            150.0       2130    13.324669     70          1       datsun pl510
499  997  36.011880  4          97            150.0       2300    15.364361     71          1       chevrolet nova